Learning from Uncertain Data

Author

  • Mehryar Mohri
Abstract

The application of statistical methods to natural language processing has been remarkably successful over the past two decades. To handle recent problems arising in this field, however, machine learning techniques must be generalized to deal with uncertain data, that is, datasets whose elements are distributions over sequences, such as weighted automata. This paper reviews a number of recent results related to this question. We discuss how to use weighted transducers to efficiently compute basic statistics from a weighted automaton, such as the expected count of an arbitrary sequence and higher moments of that distribution. Both the corresponding transducers and the related algorithms are described. We show how general classification techniques such as Support Vector Machines can be extended to deal with distributions by using general kernels between weighted automata. We describe several examples of positive definite kernels between weighted automata, such as kernels based on counts of common n-gram sequences, counts of common factors or suffixes, and other more complex kernels, and we describe a general algorithm for computing them efficiently. We also demonstrate how machine learning techniques such as clustering based on the edit-distance can be extended to deal with unweighted and weighted automata representing distributions.
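To make the two central quantities of the abstract concrete, the following is a minimal sketch of the expected count of an n-gram under a distribution over sequences, and of a kernel built from those counts. It does not reproduce the transducer-based algorithms the paper describes. As a simplifying assumption, each distribution is given as an explicit finite dictionary from strings to probabilities instead of a weighted automaton, and the kernel is taken to be the inner product of expected n-gram counts, which is one common reading of "kernels based on counts of common n-gram sequences". The names `expected_ngram_counts`, `ngram_kernel`, `lattice_a`, and `lattice_b` are hypothetical.

```python
from collections import Counter


def expected_ngram_counts(dist, n):
    """Expected count of every n-gram under a finite distribution.

    `dist` maps each string to its probability (an explicit stand-in
    for a weighted automaton; this is an assumption of the sketch).
    """
    expected = Counter()
    for seq, prob in dist.items():
        # Each occurrence of an n-gram in `seq` contributes `prob`.
        for i in range(len(seq) - n + 1):
            expected[seq[i:i + n]] += prob
    return expected


def ngram_kernel(dist_a, dist_b, n):
    """Inner product of the two expected n-gram count vectors."""
    counts_a = expected_ngram_counts(dist_a, n)
    counts_b = expected_ngram_counts(dist_b, n)
    # Counter returns 0 for n-grams that never occur in dist_b.
    return sum(c * counts_b[gram] for gram, c in counts_a.items())


# Two toy distributions over strings (hypothetical data).
lattice_a = {"abab": 0.5, "abb": 0.5}
lattice_b = {"ab": 0.25, "bab": 0.75}

print(expected_ngram_counts(lattice_a, 2))
print(ngram_kernel(lattice_a, lattice_b, 2))
```

Because the kernel is an inner product of feature vectors, it is symmetric and positive definite by construction. Enumerating every string is only feasible for small finite distributions; the transducer-based approach mentioned in the abstract is what avoids this enumeration for automata that accept many sequences.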


Similar resources

Stable Rough Extreme Learning Machines for the Identification of Uncertain Continuous-Time Nonlinear Systems

Rough extreme learning machines (RELMs) are rough-neural networks with one hidden layer where the parameters between the inputs and hidden neurons are arbitrarily chosen and never updated. In this paper, we propose RELMs with a stable online learning algorithm for the identification of continuous-time nonlinear systems in the presence of noises and uncertainties, and we prove the global ...


Bayesian Network Structure Learning from Attribute Uncertain Data

In recent years there has been a growing interest in Bayesian Network learning from uncertain data. While many researchers focus on Bayesian Network learning from data with tuple uncertainty, Bayesian Network structure learning from data with attribute uncertainty gets little attention. In this paper we make a clear definition of attribute uncertain data and Bayesian Network Learning problem fr...


One-Class-Based Uncertain Data Stream Learning

This paper presents a novel approach to one-class-based uncertain data stream learning. Our proposed approach works in three steps. Firstly, we put forward a local kernel-density-based method to generate a bound score for each instance, which refines the location of the corresponding instance. Secondly, we construct an uncertain one-class classifier by incorporating the generated bound score int...


Online active learning of decision trees with evidential data

Learning from uncertain data has been drawing increasing attention in recent years. In this paper, we propose a tree induction approach which can not only handle uncertain data, but also reduce epistemic uncertainty by querying the most valuable uncertain instances within the learning procedure. We extend classical decision trees to the framework of belief functions to deal with a v...


Adaptive Approximation-Based Control for Uncertain Nonlinear Systems With Unknown Dead-Zone Using Minimal Learning Parameter Algorithm

This paper proposes an adaptive approximation-based controller for uncertain strict-feedback nonlinear systems with unknown dead-zone nonlinearity. Dead-zone constraint is represented as a combination of a linear system with a disturbance-like term. This work invokes neural networks (NNs) as a linear-in-parameter approximator to model uncertain nonlinear functions that appear in virtual and act...


Learning DLP from Uncertain Data

Description Logic Programs (DLP) is an expressive but tractable subset of OWL. In this paper, we study a rising but under-researched problem of learning DLP from uncertain data. Current research rarely explores the plentiful uncertain data populating the Semantic Web. We handle uncertain data in the Inductive Logic Programming (ILP) framework by modifying the performance evaluation criteria. We ado...




Publication date: 2003